Identification of related gene/protein names based on an HMM of name variations

نویسندگان

  • Lana Yeganova
  • Lawrence H. Smith
  • W. John Wilbur
چکیده

Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

سیستم شناسایی و طبقه بندی اسامی در متون فارسی

Name entity recognition (NER) is a system that can identify one or more kinds of names in a text and classify them into specified categories. These categories can be name of people, organizations, companies, places (country, city, street, etc.), time related to names (date and time), financial values, percentages, etc. Although during the past decade a lot of researches has been done on NER in ...

متن کامل

The Place-Name as an Intangible Place of Memory (A Holistic Approach in Reading the Place-Names through a Comparative-Analytical Study on the Character of Name and Place)

Understanding architectural heritage and their various aspects have always been a subject of focus for the international conservation communities. Within the recent decades, eventhough the place-names are part of the living history as well as cultural heritage, they have still constantly been facing quick precipitant changes. As such, in the Conservation literature, most studies have skipped ad...

متن کامل

Identification and prioritization genes related to Hypercholesterolemia QTLs using gene ontology and protein interaction networks

Gene identification represents the first step to a better understanding of the physiological role of the underlying protein and disease pathways, which in turn serves as a starting point for developing therapeutic interventions. Familial hypercholesterolemia is a hereditary metabolic disorder characterized by high low-density lipoprotein cholesterol levels. Hypercholesterolemia is a quantitativ...

متن کامل

Throne Name in the Achaemenid period

The Achaemenid kings after Darius I elected Darius, Xerxes, and Artaxerxes as their throne name, when they were nominating or substituting for succession. Each of these kings has chosen one of these names according to what happen for they before they reached the king's throne, how to achieve the throne and based on their design and program. These names are not personal and real names, but they ...

متن کامل

Gene/Protein/Family Name Recognition In Biomedical Literature

Rapid advances in the biomedical field have resulted in the accumulation of numerous experimental results, mainly in text form. To extract knowledge from biomedical papers, or use the information they contain to interpret experimental results, requires improved techniques for retrieving information from the biomedical literature. In many cases, since the information is required in gene units, r...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Computational biology and chemistry

دوره 28 2  شماره 

صفحات  -

تاریخ انتشار 2004